2 results
Preselection statistics and Random Forest classification identify population informative single nucleotide polymorphisms in cosmopolitan and autochthonous cattle breeds
- F. Bertolini, G. Galimberti, G. Schiavo, S. Mastrangelo, R. Di Gerlando, M. G. Strillacci, A. Bagnato, B. Portolano, L. Fontanesi
- Article
Commercial single nucleotide polymorphism (SNP) arrays have recently been developed for several species and can be used to identify informative markers that differentiate breeds or populations for several downstream applications. A few statistical approaches have been proposed to identify the most discriminating genetic markers among thousands of genotyped SNPs. In this work, we compared several SNP preselection methods (Delta, Fst and principal component analysis (PCA)) in combination with Random Forest classification to analyse SNP data from six dairy cattle breeds, including cosmopolitan breeds (Holstein, Brown and Simmental) and autochthonous Italian breeds raised in two different regions and subjected to limited or no breeding programmes (Cinisara and Modicana, raised only in Sicily, and Reggiana, raised only in Emilia Romagna). From these classifications, two panels of 96 and 48 SNPs containing the most discriminant SNPs were created for each preselection method. These panels were evaluated in terms of their ability to discriminate as a whole and breed by breed, as well as linkage disequilibrium within each panel. The results showed that for the 48-SNP panels, the error rate increased mainly for autochthonous breeds, probably as a consequence of their admixed origin, lower selection pressure and ascertainment bias in the construction of the SNP chip. The 96-SNP panels were generally better able to discriminate all breeds. The panel derived by PCA-chrom (obtained by preselection chromosome by chromosome) identified informative SNPs that were particularly useful for the assignment of minor breeds, which reached their lowest Out Of Bag error values, even in Cinisara, whose error was quite high in all other panels. Moreover, this panel also contained the lowest number of SNPs in linkage disequilibrium. Several selected SNPs are located near genes affecting breed-specific phenotypic traits (coat colour and stature) or associated with production traits.
In general, our results demonstrated the usefulness of Random Forest in combination with other reduction techniques to identify population informative SNPs.
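As a rough illustration of the Delta and Fst preselection statistics named in this abstract, the sketch below scores a single biallelic SNP from per-breed allele frequencies and ranks SNPs by that score. It rests on several assumptions: Delta is taken as the mean absolute pairwise allele-frequency difference, Fst is computed as (H_T - H_S) / H_T with equal breed weights, and the names `delta`, `fst`, `top_snps` and the toy frequencies are hypothetical. The abstract does not say which estimators the authors used.

```python
from itertools import combinations


def delta(freqs):
    """Mean absolute pairwise allele-frequency difference across breeds."""
    pairs = list(combinations(freqs, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def fst(freqs):
    """Fst = (H_T - H_S) / H_T for one biallelic SNP, equal breed weights."""
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0  # SNP is monomorphic everywhere, so it carries no signal
    # mean expected heterozygosity within breeds
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t


def top_snps(freq_table, score, n):
    """Return the n SNP names with the highest score, best first."""
    ranked = sorted(freq_table, key=lambda snp: score(freq_table[snp]),
                    reverse=True)
    return ranked[:n]


# Hypothetical allele frequencies for three SNPs in two breeds.
toy = {
    "snp_a": [0.9, 0.1],
    "snp_b": [0.5, 0.5],
    "snp_c": [0.7, 0.4],
}

if __name__ == "__main__":
    print(top_snps(toy, delta, 2))
    print(top_snps(toy, fst, 2))
```

A panel of fixed size (for example 48 or 96 SNPs) would correspond to calling `top_snps` with that `n`. The Random Forest step and the linkage disequilibrium check described in the abstract are not covered here.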
Ultra-low-density genotype panels for breed assignment of Angus and Hereford cattle
- M. M. Judge, M. M. Kelleher, J. F. Kearney, R. D. Sleator, D. P. Berry
- Article
Angus and Hereford beef is marketed internationally for apparently superior meat quality attributes; DNA-based breed authentication could be a useful instrument to ensure consumer confidence in premium meat products. The objective of this study was to develop an ultra-low-density genotype panel to accurately quantify the Angus and Hereford breed proportion in biological samples. Medium-density genotypes (13 306 single nucleotide polymorphisms (SNPs)) were available on 54 703 commercial and 4042 purebred animals. The breed proportion of the commercial animals was generated from the medium-density genotypes and this estimate was regarded as the gold-standard breed composition. Ten genotype panels (100 to 1000 SNPs) were developed from the medium-density genotypes; five methods were used to identify the most informative SNPs and these included the Delta statistic, the fixation (Fst) statistic and an index of both. Breed assignment analyses were undertaken for each breed, panel density and SNP selection method separately with a programme to infer population structure using the entire 13 306 SNP panel (representing the gold-standard measure). Breed assignment was undertaken for all commercial animals (n=54 703), animals deemed to contain some proportion of Angus based on pedigree (n=5740) and animals deemed to contain some proportion of Hereford based on pedigree (n=5187). The predicted breed proportion of all animals from the lower density panels was then compared with the gold-standard breed prediction. Panel density, SNP selection method and breed all had a significant effect on the correlation between predicted and actual breed proportion. Regardless of breed, the Index method of SNP selection numerically (but not significantly) outperformed all other selection methods in accuracy (i.e. correlation and root mean square of prediction) when panel density was ⩾300 SNPs. The correlation between actual and predicted breed proportion increased as panel density increased.
Using 300 SNPs (selected using the global index method), the correlation between predicted and actual breed proportion was 0.993 and 0.995 in the Angus and Hereford validation populations, respectively. When SNP panels optimised for breed prediction in one population were used to predict the breed proportion of a separate population, the correlation between predicted and actual breed proportion was 0.034 and 0.044 weaker in the Hereford and Angus populations, respectively (using the 300 SNP panel). It is necessary to include at least 300 to 400 SNPs (per breed) on genotype panels to accurately predict breed proportion from biological samples.
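The accuracy measures mentioned in this abstract (the correlation between predicted and gold-standard breed proportion, and the root mean square of prediction) can be sketched as follows. This is a minimal illustration only: it assumes the correlation is a Pearson correlation and reads "root mean square of prediction" as a root mean square error. The function names and the example proportions are hypothetical and are not taken from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rmse(predicted, actual):
    """Root mean square error of predicted versus actual values."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical breed proportions for five animals:
# gold standard (full panel) versus a reduced panel.
gold_standard = [0.00, 0.25, 0.50, 0.75, 1.00]
reduced_panel = [0.02, 0.22, 0.53, 0.71, 0.98]

if __name__ == "__main__":
    print(round(pearson(reduced_panel, gold_standard), 3))
    print(round(rmse(reduced_panel, gold_standard), 3))
```

Comparing these two values across panel densities and selection methods is one way to reproduce the kind of comparison the abstract describes, with a higher correlation and a lower error indicating a better panel.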